 amazon polly


Transforming Higher Education with AI-Powered Video Lectures

Zhang, Dengsheng

arXiv.org Artificial Intelligence

The integration of artificial intelligence (AI) into video lecture production has the potential to transform higher education by streamlining content creation and enhancing accessibility. This paper investigates a semi-automated workflow that combines Google Gemini for script generation, Amazon Polly for voice synthesis, and Microsoft PowerPoint for video assembly. Unlike fully automated text-to-video platforms, this hybrid approach preserves pedagogical intent while ensuring script-slide synchronization, narrative coherence, and customization. Case studies demonstrate the effectiveness of Gemini in generating accurate and context-sensitive scripts for visually rich academic presentations, while Polly provides natural-sounding narration with controllable pacing. A two-course pilot study was conducted to evaluate AI-generated instructional videos (AIIV) against human instructional videos (HIV). Both qualitative and quantitative results indicate that AIIVs are comparable to HIVs in terms of learning outcomes, with students reporting high levels of clarity, coherence, and usability. However, limitations remain, particularly regarding audio quality and the absence of human-like avatars. The findings suggest that AI-assisted video production can reduce instructor workload, improve scalability, and deliver effective learning resources, while future improvements in synthetic voices and avatars may further enhance learner engagement.
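The abstract mentions narration with controllable pacing but does not say how pacing was set. One common way to control pacing in Amazon Polly is SSML, where the `<prosody rate="...">` tag adjusts speaking rate and `<break time="..."/>` inserts pauses. The sketch below is an illustration under that assumption, not the paper's method; the function name `build_paced_ssml` and its default values are hypothetical.

```python
from xml.sax.saxutils import escape


def build_paced_ssml(sentences, rate="90%", pause_ms=400):
    """Wrap narration sentences in SSML with a speaking rate and fixed pauses.

    Hypothetical helper: the rate and pause defaults are illustrative only.
    """
    # Escape &, <, > so script text cannot break the SSML markup.
    parts = [escape(sentence.strip()) for sentence in sentences]
    # Insert a pause between consecutive sentences.
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(parts)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


if __name__ == "__main__":
    print(build_paced_ssml(["Welcome to the lecture.", "Today we cover indexing."]))
```

The resulting string could be sent to Polly as SSML input; slowing the rate slightly and pausing between sentences is one plausible way to keep narration aligned with slide transitions.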


Localize content into multiple languages using AWS machine learning services

#artificialintelligence

Over the last few years, online education platforms have seen an increase in adoption of and demand for video-based learning because it offers an effective medium to engage learners. To expand to international markets and address a culturally and linguistically diverse population, businesses are also looking at diversifying their learning offerings by localizing content into multiple languages. These businesses are looking for reliable and cost-effective ways to solve their localization use cases. Localizing content mainly includes translating original voices into new languages and adding visual aids such as subtitles. Traditionally, this process has been cost-prohibitive, manual, and time-consuming, and it has required working with localization specialists.


Break through language barriers with Amazon Transcribe, Amazon Translate, and Amazon Polly

#artificialintelligence

Imagine a surgeon taking video calls with patients across the globe without the need of a human translator. What if a fledgling startup could easily expand their product across borders and into new geographical markets by offering fluid, accurate, multilingual customer support and sales, all without the need of a live human translator? What happens to your business when you're no longer bound by language? It's common today to have virtual meetings with international teams and customers that speak many different languages. Whether they're internal or external meetings, meaning often gets lost in complex discussions and you may encounter language barriers that prevent you from being as effective as you could be.


Enable conversational chatbots for telephony using Amazon Lex and the Amazon Chime SDK

#artificialintelligence

Conversational AI can deliver powerful, automated, interactive experiences through voice and text. Amazon Lex is a service that combines automatic speech recognition and natural language understanding technologies, so you can build these sophisticated conversational experiences. A common application of conversational AI is found in contact centers: self-service virtual agents. We're excited to announce that you can now use Amazon Chime SDK Public Switched Telephone Network (PSTN) audio to enable conversational self-service applications that reduce call resolution times and automate informational responses. The Amazon Chime SDK is a set of real-time communications components that developers can use to add audio, messaging, video, and screen-sharing to their web and mobile applications.


Enghouse EspialTV enables TV accessibility with Amazon Polly

#artificialintelligence

This is a guest post by Mick McCluskey, the VP of Product Management at Enghouse EspialTV. Enghouse provides software solutions that power digital transformation for communications service operators. EspialTV is an Enghouse SaaS solution that transforms the delivery of TV services for these operators across Set Top Boxes (STBs), media players, and mobile devices. A large audience of consumers use TV services, and several of these groups may have disabilities that make it more difficult for them to access these services. To ensure that TV services are accessible to the broadest possible audience, we need to consider accessibility as a key element of the user experience (UX) for the service.


Giving your content a voice with the Newscaster speaking style from Amazon Polly

#artificialintelligence

Audio content consumption has grown exponentially in the past few years. Statista reports that podcast ad revenue will exceed a billion dollars in 2021. For the publishing industry and content providers, providing audio as an alternative option to reading could improve engagement with users and be an incremental revenue stream. Given the shift in customer trends to audio consumption, Amazon Polly launched a new speaking style focusing on the publishing industry: the Newscaster speaking style. This post discusses how the Newscaster voice was built and how you can use the Newscaster voice with your content in a few simple steps.
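As a rough illustration of those steps, the Newscaster style is selected through SSML with the `<amazon:domain name="news">` tag and requires the neural engine. The sketch below only builds the request parameters; the helper name `newscaster_request` is hypothetical, and the default voice is an assumption (check which voices currently support the style).

```python
from xml.sax.saxutils import escape


def newscaster_request(text, voice_id="Matthew"):
    """Build SynthesizeSpeech parameters for the Newscaster speaking style.

    Hypothetical helper; voice_id is an assumed default, not a recommendation.
    """
    # Escape the article text so it is safe inside the SSML document.
    ssml = f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'
    return {
        "Engine": "neural",      # Newscaster is a neural-engine style
        "VoiceId": voice_id,
        "TextType": "ssml",      # tell Polly the Text field contains SSML
        "OutputFormat": "mp3",
        "Text": ssml,
    }


if __name__ == "__main__":
    print(newscaster_request("Markets closed higher today."))
```

With AWS credentials configured, the dictionary could be passed to `boto3.client("polly").synthesize_speech(**params)`, and the audio read from the `AudioStream` field of the response.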


Build a unique Brand Voice with Amazon Polly

#artificialintelligence

AWS is pleased to announce a new feature in Amazon Polly called Brand Voice, a capability in which you can work with the Amazon Polly team of AI research scientists and linguists to build an exclusive, high-quality, Neural Text-to-Speech (NTTS) voice that represents your brand's persona. Brand Voice allows you to differentiate your brand by incorporating a unique vocal identity into your products and services. Amazon Polly has been working with Kentucky Fried Chicken (KFC) Canada and National Australia Bank (NAB) to create two unique Brand Voices, using the same deep learning technology that powers the voice of Alexa. The Amazon Polly team has built a voice for KFC Canada in a Southern US English accent for the iconic Colonel Sanders to voice KFC's latest Alexa skill. The voice-activated skill available through any Alexa-enabled Amazon device allows KFC lovers in Canada to chat all things chicken with Colonel Sanders himself, including re-ordering their favorite KFC.


Using AWS AI services and custom ML models to power your web applications

#artificialintelligence

This month's meetup is all about using AWS AI services and custom Machine Learning models to power your web applications! Mike Apted, Startup Solutions Architect with Amazon Web Services, is back presenting for this month's meetup! RSVP ASAP and we'll see you there!

Agenda:
6:00pm - Arrival, mingling, pizza eating
6:20pm - Welcome & Introductions
6:30pm - Presentation Begins
7:20pm - Q&A and Open Group Discussions
8:00pm - Event concludes

Presentation Title: Using AWS AI services and custom ML models to power your web applications

Presentation Summary: In this session, we will look at how you can build a brand new web application that converts speech to text, translates text, gains insights from text, converts text to speech, and detects objects, using Amazon Transcribe, Amazon Translate, Amazon Comprehend, Amazon Polly, and Amazon Rekognition, respectively. We will then use Amazon SageMaker to label data, then train and deploy our own model, against which we will make predictions from the web application.


Amazon's Text-To-Speech AI Service Sounds More Natural And Realistic

#artificialintelligence

Amazon has enhanced Polly, its cloud-based text-to-speech service, to deliver natural and realistic speech synthesis. The service can now be used to present domain-specific styles such as newscast and sportscast. Although text-to-speech has existed for more than two decades, it has never been used in mainstream media because it lacked natural and realistic modulation. Except for automated announcements read out from existing datastores, the technology has never replaced human voice and speech. Thanks to advancements in AI, text-to-speech has become natural and realistic to the extent that it may be hard to distinguish from a human voice.


Text-to-Speech with Amazon Polly

#artificialintelligence

Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. Amazon Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries.
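Selecting a voice for a given language can be sketched as a filter over voice metadata. The example below assumes records shaped like the entries in the `Voices` list returned by Polly's DescribeVoices operation (with `Id`, `LanguageCode`, and `SupportedEngines` fields); the function name `pick_voices` and the sample voice names are hypothetical placeholders, not real Polly voices.

```python
def pick_voices(voices, language_code, engine="neural"):
    """Return the Ids of voices that match a language code and engine.

    Hypothetical helper operating on DescribeVoices-style dictionaries.
    """
    return [
        voice["Id"]
        for voice in voices
        if voice.get("LanguageCode") == language_code
        and engine in voice.get("SupportedEngines", [])
    ]


if __name__ == "__main__":
    # Placeholder records for illustration only; not actual Polly voices.
    sample = [
        {"Id": "ExampleVoiceA", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
        {"Id": "ExampleVoiceB", "LanguageCode": "de-DE", "SupportedEngines": ["standard"]},
    ]
    print(pick_voices(sample, "en-US"))
```

In a real application the list would come from `boto3.client("polly").describe_voices()["Voices"]` rather than hard-coded sample data.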